Optical Character Recognition - IMPACT Best Practice Guide
Author
Abstract
Background and developments to date
How OCR works
Best Practice in the Use of OCR
Avoiding problems in OCR
  Negative factors resulting from the source material
  Negative factors resulting from the image capture process
    Narrow binding
    Image not cropped
    Image skew
  Factors resulting from image enhancement techniques
    Bitonal output reduces readability
    Under-correction of processing software for poor quality scan
Implementing OCR
  Sample costs of OCR
  Table of OCR page prices calculated by TELplus content-providing partners
  Evaluation and Quality Assurance of OCR Results
Conclusion and further developments
  Automatic post correction
  Cooperative Correction
Similar resources
The Search for Underlying Principles of Health Impact Assessment: Progress and Prospects; Comment on “Investigating Underlying Principles to Guide Health Impact Assessment”
Health Impact Assessment (HIA) is a relatively young field of endeavour, and hence, future progress will depend on the planning, implementation and rigorous evaluation of additional HIAs of projects, programmes and policies the world over. In the June 2014 issue of the International Journal of Health Policy and Management, Fakhri and colleagues investigated underlying principles of HIA through ...
Measuring the impact of character recognition errors on downstream text analysis
Noise presents a serious challenge in optical character recognition, as well as in the downstream applications that make use of its outputs as inputs. In this paper, we describe a paradigm for measuring the impact of recognition errors on the stages of a standard text analysis pipeline: sentence boundary detection, tokenization, and part-of-speech tagging. Employing a hierarchical methodology b...
Online Chinese Handwritten Character . . .
Online Chinese handwriting recognition has attracted much research attention recently due to its complexity, wide-spread applications and emerging market demands. This material serves as a guide for pattern recognition researchers who have limited or no background in this language. We provide a brief review of the nature of the problem and challenges of online Chinese handwritten character reco...
Enhanced Good-Turing and Cat.Cal: Two New Methods for Estimating Probabilities of English Bigrams (abbreviated version)
For many pattern recognition applications including speech recognition and optical character recognition, prior models of language are used to disambiguate otherwise equally probable outputs. It is common practice to use tables of probabilities of single words, pairs of words, and triples of words (n-grams) as a prior model. Our research is directed to 'backing-off' methods, that is, methods th...
OCR and post-correction of historical Finnish texts
This paper presents experiments on optical character recognition (OCR) as a combination of Ocropy software and data-driven spelling correction that uses Weighted Finite-State Methods. Both model training and testing were done on Finnish corpora of historical newspaper text, and the best combination of OCR and post-processing models gives 95.21% character recognition accuracy.